Workflow

Single-cell transcriptomics of cells infected with barcoded influenza virions. This experiment allows accurate detection of the number of unique virions infecting each cell and of the resulting impact on that cell's transcriptome. The single-cell transcriptomics were performed using 10X Chromium.

The basic steps in the analysis are as follows:

Detailed software versions can be found under Rules.

Results

File Size Description Job properties
fastq10x_qc_analysis.html 358.6 kB

Analysis of quality-control statistics from the generation of the 10X FASTQ files using cellranger mkfastq, in the form of an HTML rendering of a Jupyter notebook.

Rule: fastq10x_qc_analysis

File Size Description Job properties
align_fastq10x_summary.html 446.6 kB

Statistics from the STARsolo alignments of the 10X Illumina FASTQ files in the form of an HTML rendering of a Jupyter notebook.

Rule: align_fastq10x_summary

fastq10x_transcript_coverage.html 608.7 kB

Coverage plots for selected transcripts in the aligned 10X Illumina reads, in the form of an HTML rendering of a Jupyter notebook.

Rule: fastq10x_transcript_coverage

File Size Description Job properties
hashing_trial1_analyze_cell_gene_matrix.html 977.9 kB

Analysis of the cell-gene matrix for hashing_trial1 in the form of an HTML rendering of a Jupyter notebook.

Rule: analyze_cell_gene_matrix
Wildcards: sample10x=hashing_trial1

hashing_trial2_analyze_cell_gene_matrix.html 1.0 MB 641.8 kB

Analysis of the cell-gene matrix for hashing_trial2 in the form of an HTML rendering of a Jupyter notebook.

Rule: analyze_cell_gene_matrix
Wildcards: sample10x=hashing_trial2

hashing_wt_rapidpilot_analyze_cell_gene_matrix.html 993.4 kB

Analysis of the cell-gene matrix for hashing_wt_rapidpilot in the form of an HTML rendering of a Jupyter notebook.

Rule: analyze_cell_gene_matrix
Wildcards: sample10x=hashing_wt_rapidpilot

File Size Description Job properties
count_viralbc_fastq10x-hashing_trial1.html 347.0 kB

Counting of viral barcodes for hashing_trial1, in the form of an HTML rendering of a Jupyter notebook.

Rule: count_viralbc_fastq10x
Wildcards: sample10x=hashing_trial1

count_viralbc_fastq10x-hashing_trial2.html 352.7 kB

Counting of viral barcodes for hashing_trial2, in the form of an HTML rendering of a Jupyter notebook.

Rule: count_viralbc_fastq10x
Wildcards: sample10x=hashing_trial2

count_viralbc_fastq10x-hashing_wt_rapidpilot.html 344.0 kB

Counting of viral barcodes for hashing_wt_rapidpilot, in the form of an HTML rendering of a Jupyter notebook.

Rule: count_viralbc_fastq10x
Wildcards: sample10x=hashing_wt_rapidpilot

count_viraltags_fastq10x-hashing_trial1.html 378.6 kB

Counting of the viral tags for hashing_trial1 in the form of an HTML rendering of a Jupyter notebook.

Rule: count_viraltags_fastq10x
Wildcards: sample10x=hashing_trial1

count_viraltags_fastq10x-hashing_trial2.html 374.3 kB

Counting of the viral tags for hashing_trial2 in the form of an HTML rendering of a Jupyter notebook.

Rule: count_viraltags_fastq10x
Wildcards: sample10x=hashing_trial2

count_viraltags_fastq10x-hashing_wt_rapidpilot.html 375.9 kB

Counting of the viral tags for hashing_wt_rapidpilot in the form of an HTML rendering of a Jupyter notebook.

Rule: count_viraltags_fastq10x
Wildcards: sample10x=hashing_wt_rapidpilot

gap_analysis.html 1.1 MB

Analysis of reads with gaps in the aligned 10X Illumina FASTQ reads in the form of an HTML rendering of a Jupyter notebook.

Rule: analyze_gaps

viral_fastq10x_coverage.html 816.8 kB

Analysis of the coverage of the viral genes (including viral tags and viral barcodes) in the aligned 10X Illumina FASTQ reads in the form of an HTML rendering of a Jupyter notebook.

Rule: viral_fastq10x_coverage
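Several of the results above are produced by one rule run once per value of the `sample10x` wildcard. As a minimal sketch of that naming pattern, the expansion can be written as below. The helper `report_paths` and the template strings are illustrative assumptions inferred from the file names listed in this report; they are not code taken from the workflow.

```python
# Sample names as they appear in the Wildcards entries of this report.
SAMPLES = ["hashing_trial1", "hashing_trial2", "hashing_wt_rapidpilot"]

# Hypothetical templates inferred from the output file names in this report.
TEMPLATES = {
    "analyze_cell_gene_matrix":
        "results/analysis/{sample10x}_analyze_cell_gene_matrix.html",
    "count_viralbc_fastq10x":
        "results/viral_fastq10x/count_viralbc_fastq10x-{sample10x}.html",
    "count_viraltags_fastq10x":
        "results/viral_fastq10x/count_viraltags_fastq10x-{sample10x}.html",
}


def report_paths(rule, samples=SAMPLES):
    """Fill the `sample10x` wildcard of a rule's template for every sample."""
    return [TEMPLATES[rule].format(sample10x=sample) for sample in samples]


if __name__ == "__main__":
    for rule in TEMPLATES:
        for path in report_paths(rule):
            print(path)
```

Each rule therefore yields three HTML notebooks, one per sample, which matches the job count of 3 reported for these rules.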

Statistics

If the workflow has been executed in cluster/cloud, runtimes include the waiting time in the queue.

Configuration

File Code
config.yaml
# YAML configuration file for the analysis

# max CPUs used by any rules
max_cpus: 16

# file specifying 10X Illumina runs
illumina_runs_10x: data/illumina_runs_10x.csv

# output directories
fastq10x_dir: results/fastq10x  # FASTQ files & QC stats for 10X Illumina runs
mkfastq10x_dir: results/fastq10x/mkfastq_output  # `cellranger mkfastq` output
genome_dir: results/genomes  # location of downloaded genomes and annotations
refgenome: results/genomes/refgenome  # STAR reference genome directory
aligned_fastq10x_dir: results/aligned_fastq10x  # aligned 10X Illumina reads
viral_fastq10x_dir: results/viral_fastq10x  # viral tags / barcodes in 10X reads
analysis_dir: results/analysis  # fine-grained analyses

# cellular genome and GTF ftp sites
cell_genome_ftp: ftp://ftp.ensembl.org/pub/release-98/fasta/canis_familiaris/dna/Canis_familiaris.CanFam3.1.dna.toplevel.fa.gz
cell_gtf_ftp: ftp://ftp.ensembl.org/pub/release-98/gtf/canis_familiaris/Canis_familiaris.CanFam3.1.98.gtf.gz

# viral genome (FASTA), GTF, and Genbank file locations
viral_genome: data/flu_sequences/flu-CA09.fasta
viral_gtf: data/flu_sequences/flu-CA09.gtf
viral_genbank: data/flu_sequences/flu-CA09.gb

# file giving nucleotide identities at viral tag sites
viraltag_identities: data/flu_sequences/flu-CA09_viral_tags.yaml

# STAR alignment parameters.
barcode_total_length: 28  # length of UMI + CB

# These score settings reduce the penalty for non-canonical splice sites,
# which is probably bad for mapping cellular reads but good for mapping
# viral reads, which will have deletions not corresponding to splice sites.
scoreGapNoncan: -4
scoreGapGCAG: -4
scoreGapATAC: -4

# URL location of 10X barcode whitelist: **this is for the v3 kit**
cb_whitelist_10x_url: https://github.com/10XGenomics/cellranger/raw/master/lib/python/cellranger/barcodes/3M-february-2018.txt.gz
cb_whitelist_10x: results/aligned_fastq10x/cb_whitelist_10x.txt

cb_len_10x: 16  # length of 10X cell barcode
umi_len_10x: 12  # length of 10X UMI: **this is for the v3 kit**

expect_ncells: 6000  # expected cells per 10X run, for "knee" cell calling
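
The comments in the configuration imply that `barcode_total_length` should equal `cb_len_10x + umi_len_10x` (16 + 12 = 28 for the v3 kit). A minimal sketch of a sanity check for that relationship follows. The function `check_barcode_lengths` is a hypothetical helper, and the dictionary simply repeats values from the `config.yaml` listing rather than parsing the file.

```python
def check_barcode_lengths(config):
    """Return CB + UMI length, raising ValueError if it disagrees with the config."""
    expected = config["cb_len_10x"] + config["umi_len_10x"]
    if config["barcode_total_length"] != expected:
        raise ValueError(
            f"barcode_total_length={config['barcode_total_length']} "
            f"but cb_len_10x + umi_len_10x = {expected}"
        )
    return expected


# Values copied from the config.yaml listing in this report.
config = {"barcode_total_length": 28, "cb_len_10x": 16, "umi_len_10x": 12}

if __name__ == "__main__":
    print(check_barcode_lengths(config))
```

A check like this would matter mainly when switching kits, since the whitelist URL and UMI length are both marked as specific to the v3 kit.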

Rules

Rule Jobs Output Singularity Conda environment Code
fastq10x_qc_analysis 1
  • results/fastq10x/fastq10x_qc_analysis.ipynb
  • results/fastq10x/fastq10x_qc_analysis.html
source
align_fastq10x_summary 1
  • results/aligned_fastq10x/align_fastq10x_summary.ipynb
  • results/aligned_fastq10x/align_fastq10x_summary.html
source
fastq10x_transcript_coverage 1
  • results/aligned_fastq10x/fastq10x_transcript_coverage.ipynb
  • results/aligned_fastq10x/fastq10x_transcript_coverage.html
source
viral_fastq10x_coverage 1
  • results/viral_fastq10x/viraltag_locs.csv
  • results/viral_fastq10x/viralbc_locs.csv
  • results/viral_fastq10x/viral_fastq10x_coverage.ipynb
  • results/viral_fastq10x/viral_fastq10x_coverage.html
source
analyze_gaps 1
  • results/viral_fastq10x/gap_analysis.ipynb
  • results/viral_fastq10x/gap_analysis.html
source
count_viraltags_fastq10x 3
  • results/viral_fastq10x/count_viraltags_fastq10x-hashing_wt_rapidpilot.ipynb
  • results/viral_fastq10x/count_viraltags_fastq10x-hashing_wt_rapidpilot.html
  • results/viral_fastq10x/viraltag_counts_hashing_wt_rapidpilot.csv
  • results/viral_fastq10x/count_viraltags_fastq10x-hashing_trial1.ipynb
  • results/viral_fastq10x/count_viraltags_fastq10x-hashing_trial1.html
  • results/viral_fastq10x/viraltag_counts_hashing_trial1.csv
  • results/viral_fastq10x/count_viraltags_fastq10x-hashing_trial2.ipynb
  • results/viral_fastq10x/count_viraltags_fastq10x-hashing_trial2.html
  • results/viral_fastq10x/viraltag_counts_hashing_trial2.csv
source
count_viralbc_fastq10x 3
  • results/viral_fastq10x/count_viralbc_fastq10x-hashing_wt_rapidpilot.ipynb
  • results/viral_fastq10x/count_viralbc_fastq10x-hashing_wt_rapidpilot.html
  • results/viral_fastq10x/viralbc_counts_hashing_wt_rapidpilot.csv
  • results/viral_fastq10x/count_viralbc_fastq10x-hashing_trial1.ipynb
  • results/viral_fastq10x/count_viralbc_fastq10x-hashing_trial1.html
  • results/viral_fastq10x/viralbc_counts_hashing_trial1.csv
  • results/viral_fastq10x/count_viralbc_fastq10x-hashing_trial2.ipynb
  • results/viral_fastq10x/count_viralbc_fastq10x-hashing_trial2.html
  • results/viral_fastq10x/viralbc_counts_hashing_trial2.csv
source
analyze_cell_gene_matrix 3
  • results/analysis/hashing_wt_rapidpilot_analyze_cell_gene_matrix.ipynb
  • results/analysis/hashing_wt_rapidpilot_analyze_cell_gene_matrix.html
  • results/analysis/hashing_trial1_analyze_cell_gene_matrix.ipynb
  • results/analysis/hashing_trial1_analyze_cell_gene_matrix.html
  • results/analysis/hashing_trial2_analyze_cell_gene_matrix.ipynb
  • results/analysis/hashing_trial2_analyze_cell_gene_matrix.html
source
make_fastq10x 5
  • results/fastq10x/hashing_wt_rapidpilot-2019-12-03_all_R1.fastq.gz
  • results/fastq10x/hashing_wt_rapidpilot-2019-12-03_all_R2.fastq.gz
  • results/fastq10x/mkfastq_output/hashing_wt_rapidpilot-2019-12-03
  • results/fastq10x/hashing_wt_rapidpilot-2019-12-03_qc_stats.csv
  • _mkfastq_hashing_wt_rapidpilot-2019-12-03.csv
  • __hashing_wt_rapidpilot-2019-12-03.mro
  • results/fastq10x/hashing_trial1-2020-01-16_all_R1.fastq.gz
  • results/fastq10x/hashing_trial1-2020-01-16_all_R2.fastq.gz
  • results/fastq10x/mkfastq_output/hashing_trial1-2020-01-16
  • results/fastq10x/hashing_trial1-2020-01-16_qc_stats.csv
  • _mkfastq_hashing_trial1-2020-01-16.csv
  • __hashing_trial1-2020-01-16.mro
  • results/fastq10x/hashing_trial1-2020-02-18_all_R1.fastq.gz
  • results/fastq10x/hashing_trial1-2020-02-18_all_R2.fastq.gz
  • results/fastq10x/mkfastq_output/hashing_trial1-2020-02-18
  • results/fastq10x/hashing_trial1-2020-02-18_qc_stats.csv
  • _mkfastq_hashing_trial1-2020-02-18.csv
  • __hashing_trial1-2020-02-18.mro
  • results/fastq10x/hashing_trial2-2020-06-02_all_R1.fastq.gz
  • results/fastq10x/hashing_trial2-2020-06-02_all_R2.fastq.gz
  • results/fastq10x/mkfastq_output/hashing_trial2-2020-06-02
  • results/fastq10x/hashing_trial2-2020-06-02_qc_stats.csv
  • _mkfastq_hashing_trial2-2020-06-02.csv
  • __hashing_trial2-2020-06-02.mro
  • results/fastq10x/hashing_trial2-2020-07-01_all_R1.fastq.gz
  • results/fastq10x/hashing_trial2-2020-07-01_all_R2.fastq.gz
  • results/fastq10x/mkfastq_output/hashing_trial2-2020-07-01
  • results/fastq10x/hashing_trial2-2020-07-01_qc_stats.csv
  • _mkfastq_hashing_trial2-2020-07-01.csv
  • __hashing_trial2-2020-07-01.mro
source
align_fastq10x 3
  • results/aligned_fastq10x/hashing_wt_rapidpilot/Solo.out/Gene/Summary.csv
  • results/aligned_fastq10x/hashing_wt_rapidpilot/Solo.out/Gene/UMIperCellSorted.txt
  • results/aligned_fastq10x/hashing_wt_rapidpilot/Solo.out/Gene/filtered/matrix.mtx
  • results/aligned_fastq10x/hashing_wt_rapidpilot/Solo.out/Gene/filtered/features.tsv
  • results/aligned_fastq10x/hashing_wt_rapidpilot/Solo.out/Gene/filtered/barcodes.tsv
  • results/aligned_fastq10x/hashing_wt_rapidpilot/Aligned.sortedByCoord.out.bam
  • results/aligned_fastq10x/hashing_trial1/Solo.out/Gene/Summary.csv
  • results/aligned_fastq10x/hashing_trial1/Solo.out/Gene/UMIperCellSorted.txt
  • results/aligned_fastq10x/hashing_trial1/Solo.out/Gene/filtered/matrix.mtx
  • results/aligned_fastq10x/hashing_trial1/Solo.out/Gene/filtered/features.tsv
  • results/aligned_fastq10x/hashing_trial1/Solo.out/Gene/filtered/barcodes.tsv
  • results/aligned_fastq10x/hashing_trial1/Aligned.sortedByCoord.out.bam
  • results/aligned_fastq10x/hashing_trial2/Solo.out/Gene/Summary.csv
  • results/aligned_fastq10x/hashing_trial2/Solo.out/Gene/UMIperCellSorted.txt
  • results/aligned_fastq10x/hashing_trial2/Solo.out/Gene/filtered/matrix.mtx
  • results/aligned_fastq10x/hashing_trial2/Solo.out/Gene/filtered/features.tsv
  • results/aligned_fastq10x/hashing_trial2/Solo.out/Gene/filtered/barcodes.tsv
  • results/aligned_fastq10x/hashing_trial2/Aligned.sortedByCoord.out.bam
source
index_bam 3
  • results/aligned_fastq10x/hashing_wt_rapidpilot/Aligned.sortedByCoord.out.bam.bai
  • results/aligned_fastq10x/hashing_trial1/Aligned.sortedByCoord.out.bam.bai
  • results/aligned_fastq10x/hashing_trial2/Aligned.sortedByCoord.out.bam.bai
samtools index {input} {output}
make_refgenome 1
  • results/genomes/cell_and_virus_gtf.gtf
  • results/genomes/refgenome
        cat {input.cell_gtf} {input.viral_gtf} > {output.concat_gtf}
        mkdir -p {output.genomeDir}
        STAR --runThreadN {threads} \
             --runMode genomeGenerate \
             --genomeDir {output.genomeDir} \
             --genomeFastaFiles {input.cell_genome} {input.viral_genome} \
             --sjdbGTFfile {output.concat_gtf}
        
get_cb_whitelist_10x 1
  • results/aligned_fastq10x/cb_whitelist_10x.txt
        if [[ {params.url} == *.gz ]]
        then
            wget -O - {params.url} | gunzip -c > {output}
        else
            wget -O - {params.url} > {output}
        fi
        
get_cell_genome 1
  • results/genomes/cell_genome.fasta
wget -O - {params.ftp} | gunzip -c > {output}
get_cell_gtf 1
  • results/genomes/cell_gtf.gtf
wget -O - {params.ftp} | gunzip -c > {output}
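
The download rules above stream a URL to a file. `get_cell_genome` and `get_cell_gtf` always gunzip the stream, while `get_cb_whitelist_10x` does so only when the URL ends in `.gz`. A rough Python equivalent of the conditional case, using only the standard library, is sketched below. The function names `maybe_gunzip` and `fetch` are assumptions made for illustration; the workflow itself uses `wget` and `gunzip` as shown.

```python
import gzip
import urllib.request


def maybe_gunzip(name, payload):
    """Return payload decompressed if name ends in .gz, otherwise unchanged."""
    return gzip.decompress(payload) if name.endswith(".gz") else payload


def fetch(url, output):
    """Download url and write it to output, decompressing .gz content."""
    with urllib.request.urlopen(url) as response:
        payload = response.read()
    with open(output, "wb") as handle:
        handle.write(maybe_gunzip(url, payload))
```

Unlike the shell pipeline, this sketch holds the whole download in memory, so it suits the barcode whitelist better than a full genome FASTA.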